(Fwd) WWW Servers on SOLARIS Bandwidth flood on Internet

Darren Reed (avalon@coombs.anu.edu.au)
Wed, 18 Jan 1995 10:15:39 +1100 (EDT)
Next message: G.J.W. Hagenaars: "Re: Sol2.x Mouse EXPLOIT info - CORRECTION"
Previous message: Leo Bicknell: "Re: Sol2.x Mouse EXPLOIT info - CORRECTION"
From: "Richard D. Stiennon" <richard@fe3.rust.net>
Message-Id: <9501170644.ZM19531@Fe3.rust.net>
Date: Tue, 17 Jan 1995 06:44:52 -0400
Subject: (Fwd) WWW Servers on SOLARIS Bandwidth flood on Internet

I thought the following summary of a problem caused by SLIP/PPP clients
would be of interest to other ISPs.

It was discovered this past weekend by contributors to the portmaster-users
mail list.

-Richard Stiennon
 RustNet



--- Forwarded mail from Ed Goldgehn <edg@OCN.Com>

To: Portmaster Users Group <portmaster-users@msen.com>


It has come to the attention of a few members of the Livingston Enterprises
PortMaster Users Group on the Internet that a kernel bug in Solaris 2.X is
causing an unidentified (but potentially significant) number of unnecessary
data packets to be placed on the Internet by WWW servers.  There has been an
unconfirmed report that a similar error exists on SGI machines as well.

The nature of the kernel bug is most often exposed with httpd daemons (WWW
Servers) when the Solaris kernel does not recognize or receive a session
disconnect when a remote user terminates their session with the WWW server.
When this occurs, the remote user's session stays forever active on the WWW
server and will continuously send data packets out over the Internet.

Under certain circumstances (yet to be identified), a session state of
CLOSE-WAIT with a non-zero Send-queue and Send Window can exist.  This
specific state currently results in a looped state that sends out 1396 byte
packets over the Internet every *56 seconds* (after max exponential back-off
and based on the value of the parameter tcp_rexmit_interval_max).  This
state will continue to exist until the WWW server is reset or it receives an
RST from the client.
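
One way to look for sessions in the state described above is to scan
netstat output for CLOSE_WAIT entries whose send window and send queue are
both non-zero.  The following is only a rough sketch: the sample text is
hypothetical, and the seven-column layout (local address, remote address,
Swind, Send-Q, Rwind, Recv-Q, state) is an assumption about how the
server's netstat prints TCP connections, so check it against real output
before relying on it.

```python
# Hypothetical netstat-style sample. The column order
# (Local, Remote, Swind, Send-Q, Rwind, Recv-Q, State) is an assumption.
SAMPLE = """\
192.0.2.10.80        198.51.100.7.1234     8760   1396  8760      0 CLOSE_WAIT
192.0.2.10.80        198.51.100.9.1301     8760      0  8760      0 ESTABLISHED
192.0.2.10.80        203.0.113.5.1027      4096      0  8760      0 CLOSE_WAIT
"""


def stuck_connections(netstat_text):
    """Return (local, remote, send_q) for CLOSE_WAIT lines whose
    send window and send queue are both non-zero."""
    hits = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and anything not in CLOSE_WAIT.
        if len(fields) != 7 or fields[6] != "CLOSE_WAIT":
            continue
        local, remote, swind, send_q = fields[0], fields[1], fields[2], fields[3]
        if not (swind.isdigit() and send_q.isdigit()):
            continue
        if int(swind) > 0 and int(send_q) > 0:
            hits.append((local, remote, int(send_q)))
    return hits


if __name__ == "__main__":
    for local, remote, send_q in stuck_connections(SAMPLE):
        print(f"{local} -> {remote}: {send_q} bytes still queued")
```

A CLOSE_WAIT entry with an empty send queue (the third sample line) is not
reported, since it has nothing left to retransmit.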

According to an interpretation of the TCP code, it appears that a window
close on the side of the PC will cause the connection never to time out.
Solaris 2.x
nullifies the window close and reopens it to do a window probe.  This
situation has been observed on both the latest and earlier OS revisions.

*************************************************************************
This condition is likely to result from any standard dial-up SLIP or PPP
account available from any Internet Service Provider (ISP).

This condition is likely to impact the bandwidth availability of any ISP
regardless of what machines or OS the ISP uses.  The more dial-up SLIP/PPP
users an ISP has, the more likely they are to be affected by this problem.
*************************************************************************

Since the data packets are initiated at the server end of the Internet, and
the packets are being sent to disconnected client sessions at any other end
of the Internet, nearly all ISPs are likely to be receiving incoming
packets on their backend connections to the net which simply use up
bandwidth.  The destinations of these packets are most likely IP addresses
previously used by dial-in users who have since disconnected.
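
To get a feel for the bandwidth involved, the figures quoted earlier in
this message (one 1396 byte packet every 56 seconds per stuck session) can
be turned into a rough estimate.  This is a back-of-the-envelope sketch
only: it treats 1396 bytes as the full packet size, ignores link-layer
framing, and the session counts used are hypothetical.

```python
PACKET_BYTES = 1396      # packet size quoted in the message
INTERVAL_SECONDS = 56    # retransmit interval quoted in the message


def wasted_bits_per_second(stuck_sessions,
                           packet_bytes=PACKET_BYTES,
                           interval_seconds=INTERVAL_SECONDS):
    """Average bit rate consumed by the given number of stuck sessions."""
    return stuck_sessions * packet_bytes * 8 / interval_seconds


def wasted_megabytes_per_day(stuck_sessions,
                             packet_bytes=PACKET_BYTES,
                             interval_seconds=INTERVAL_SECONDS):
    """Total volume per day, in units of 10**6 bytes."""
    packets_per_day = 86400 / interval_seconds
    return stuck_sessions * packets_per_day * packet_bytes / 1e6


if __name__ == "__main__":
    for sessions in (1, 100, 1000):   # hypothetical session counts
        print(sessions,
              round(wasted_bits_per_second(sessions)),
              round(wasted_megabytes_per_day(sessions), 1))
```

A single stuck session works out to about 200 bits per second, so the
effect only becomes noticeable on a dial-up scale once many sessions pile
up on the same link.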

Initial analysis is that this condition arises when an ungraceful client
disconnect is followed by the client dropping off the net without sending
an RST.  This may also be related to the use of _broken_ protocol (IP)
stacks on personal computers by any number of dial-up users that do not
send an RST (these appear to be the most common).

An indication of the possible extent of this situation on the Internet as
a whole is that this problem was originally identified with Chameleon 4.00
running on a PC using Netscape 1.0N as the client software.

Further investigation to identify more specifically what conditions allow
this problem to occur is taking place.  If you want
additional information, or wish to provide additional technical input, a
majordomo list has been established.

To subscribe, send a mail message to:
       majordomo@destek.net
with "subscribe solwww-bug" in the body of the message.

To submit something for reflection on the list, send mail to
       solwww-bug@destek.net.
---------------------------------------------------------------------------
Thanks to the following individuals for their part in identifying and
determining this problem:

Paul Lind <paul@cruz.com> - first identified the problem and reported it to
the PortMaster Users Group

Guido van Rooij <guido@iaehv.nl> - traced the problem down to a TCP/IP stack
related issue that got the ball rolling

Cor Bosman <cor@xs4all.net> - traced the problem down to Solaris

Casper Dik <casper@fwi.uva.nl> - the Solaris expert who identified the
specific socket state and other Solaris-specific technical information that
formed the basis of this post
----------------------------------------------------------------------------
Casper has provided the following command to raise the kernel parameter
tcp_rexmit_interval_max to 600 seconds.  Mathematically, this setting will
reduce retransmits of this nature by about 90% (one packet every 600
seconds instead of one every 56 seconds).

(ndd -set /dev/tcp tcp_rexmit_interval_max 600000)

The 600000 parameter is in milliseconds.
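
The 90% figure can be checked with a little arithmetic.  The sketch below
assumes that, once back-off has reached its maximum, a stuck session sends
exactly one packet per interval, and that the interval moves from the
observed 56 seconds to the new 600 second maximum; neither assumption has
been verified against the Solaris code.

```python
def retransmit_reduction(old_interval_s, new_interval_s):
    """Fractional drop in packets sent when the steady-state retransmit
    interval grows from old_interval_s to new_interval_s."""
    return 1 - old_interval_s / new_interval_s


def packets_per_hour(interval_s):
    """Packets one stuck session sends per hour at a fixed interval."""
    return 3600 / interval_s


if __name__ == "__main__":
    new_interval_s = 600000 / 1000   # the ndd value is in milliseconds
    print(f"reduction: {retransmit_reduction(56, new_interval_s):.1%}")
    print(f"before: {packets_per_hour(56):.1f} packets/hour per session")
    print(f"after:  {packets_per_hour(new_interval_s):.1f} packets/hour per session")
```

Note that this only slows the retransmits down; by the description above,
the stuck session itself remains until the server is reset or an RST
arrives.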

----------------------------------------------------------------------------
Also, thanks to Marc Evans <marc@destek.net> for setting up the listserv to
help all of us communicate effectively about this problem.

**************************************************************************
Ed Goldgehn                                     E-Mail:  edg@ocn.com
Sr. Vice President                              Voice:   (404) 919-1561
Open Communication Networks, Inc.               Fax:     (404) 919-1568
**************************************************************************



--- End of forwarded mail from Ed Goldgehn <edg@OCN.Com>